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Abstract — We present DCOOL-NET, a scalable distributed 
in-network algorithm for sensor network localization based on 
noisy range measurements. DCOOL-NET operates by parallel, 
collaborative message passing between single-hop neighbor sen- 
sors, and involves simple computations at each node. It stems 
from an application of the majorization-minimization (MM) 
framework to the nonconvex optimization problem at hand, and 
capitalizes on a novel convex majorizer. The proposed majorizer 
is endowed with several desirable properties and represents a 
key contribution of this work. It is a more accurate match to the 
underlying nonconvex cost function than popular MM quadratic 
majorizers, and is readily amenable to distributed minimization 
via the alternating direction method of multipliers (ADMM). 
Moreover, it allows for low-complexity, fast Nesterov gradient 
methods to tackle the ADMM subproblems induced at each 
node. Computer simulations show that DCOOL-NET achieves 
comparable or better sensor position accuracies than a state-of- 
art method which, furthermore, is not parallel. 

Index Terms — Distributed algorithms, majorization- 
minimization, non-convex optimization, majorizing function, 
distributed iterative sensor localization, sensor networks. 

EDICS Category: SEN-DIST SEN-COLB 



I. Introduction 

Applications of sensor networks in environmental and in- 
frastructure monitoring, surveillance, or healthcare {e.g., body 
area networks, infrastructure area networks, as well as staff and 
equipment localization [ 1 ]), typically rely on known sensor 
nodes' positions, even if the main goal of the network is not 
localization. A sensor network usually comprises a large set 
of miniature, low cost, low power autonomous sensor nodes. 
In this scenario it is generally unsuitable or even impossible 
to accurately deploy all sensor nodes in a predefined location 
within the network monitored area. GPS is also discarded as 
an option for indoor applications or due to cost and energy 
consumption constraints. 

Centralized methods: There is a considerable body of work 
on centralized sensor network localization. These algorithms 
entail a central or fusion processing node, to which all sensor 
nodes communicate their range measurements. The energy 
spent on communications can degrade the network operation 
lifetime, since centralized architectures are prone to data 
traffic bottlenecks close to the central node. Resilience to 
failure, security and privacy issues are, also, not naturally 
accounted for by the centralized architecture. Moreover, as 
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the number of nodes in the network grows, the problem to 
be solved at the central node becomes increasingly complex, 
thus raising scalability concerns. Focusing on recent work, 
several different approaches are available, such as J2|, where 
sensor network localization is formulated as a regression 
problem over adaptive bases, or [3], where the strategy is to 
perform successive minimizations of a weighted least squares 
cost function convolved with a Gaussian kernel of decreasing 
variance. Another successfully pursued approach is to perform 
semi-definite (SDP) or second order cone (SOCP) relaxations 
to the original non-convex problem HI, (5). In BJ and %\ the 
majorization-minimization (MM) framework was used with 
quadratic cost functions to derive centralized approaches to the 
sensor network localization problem. Finally, another widely 
used methodology relies on multidimensional scaling (MDS), 
where the sensor network localization problem is posed as a 
least-squares problem, as in (7), |8). 

Distributed methods: Distributed approaches for sensor 
network localization, requiring no central or fusion node, 
have been less frequent, despite the more suited nature of 
this computational paradigm to the problem at hand and 
the discussed advantages in applications. The work in (5J 
proposes a parallel distributed algorithm. However, the sensor 
network localization problem adopts a discrepancy function 
between squared distances which, unlike the ones in maximum 
likelihood (ML) methods, is known to greatly amplify mea- 
surement errors and outliers. The convergence properties of the 



algorithm are not studied theoretically. The work in [10] also 
considers network localization outside a ML framework. The 
approach proposed in flO) is not parallel, operating sequen- 
tially through layers of nodes: neighbors of anchors estimate 
their positions and become anchors themselves, making it 
possible in turn for their neighbors to estimate their positions, 
and so on. Position estimation is based on planar geometry- 
based heuristics. In [ 11 j, the authors propose an algorithm with 
assured asymptotic convergence, but the solution is computa- 
tionally complex since a triangulation set must be calculated, 
and matrix operations are pervasive. Furthermore, in order to 
attain good accuracy, a large number of range measurement 
rounds must be acquired, one per iteration of the algorithm, 
thus increasing energy expenditure. On the other hand, the 
algorithm presented in fl2j and based on the non-linear Gauss 
Seidel framework, has a pleasingly simple implementation, 
combined with the convergence guarantees inherited from 
the non-linear Gauss Seidel framework. Notwithstanding, this 
algorithm is sequential, i.e., nodes perform their calculations 
in turn, not in a parallel fashion. This entails the existence 
of a network-wide coordination procedure to precompute the 
processing schedule upon startup, or whenever a node joins or 
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leaves the network. 

Contribution: In the present work, we set forth a distributed 
algorithm, termed DCOOL-NET, for sensor network local- 
ization, in which all nodes work in parallel and collaborate 
only with single-hop neighbors. DCOOL-NET is obtained by 
casting the localization problem in a principled ML framework 
and following a majorization-minorization approach (TT3l to 
tackle the resulting nonconvex optimization problem. 

MM paradigm operates on a novel convex majorization 
function which we tailored to match the particular nonconvex 
structure in the cost function. The proposed majorizer is a 
much tighter approximation than the popular MM quadratic 
majorizers, and enjoys several optimization-friendly proper- 
ties. It is readily amenable to distributed minimization via 
the alternating direction method of multipliers (ADMM), 
with convergence guarantees. Moreover, by capitalizing on 
its special properties, we show how to setup low-complexity 
algorithms based on the fast-gradient Nesterov methods to 
address the ADMM subproblems at each node. 

As in other existing distributed iterative approaches, there 
is no theoretical guarantee a priori that DCOOL-NET will 
find the global minimum of the nonconvex cost function for 
any given initialization (even in the more favorable centralized 
setting, we are not aware of theoretical proofs establishing 
global convergence of existing methods for tackling ([TJ). 
However, computer simulations show that our approach yields 
comparable or better sensor position accuracies, for the same 
initialization, than the state-of-art method in | [T2[ which, like 
DCOOL-NET, is easily implementable and ensures descent of 
the cost function at each iteration, but, contrary to DCOOL- 
NET, it is not a parallel method. 

II. Problem statement 

The sensor network is represented as an undirected con- 
nected graph Q = (V, £). The node set V = {1,2, ...,n} 
denotes the sensors with unknown positions. There is an edge 
i ~ j € £ between sensors i and j if and only if a noisy range 
measurement between nodes i and j is available at both of 
them and nodes i and j can communicate with each other. The 
set of sensors with known positions, hereafter called anchors, 
is denoted by A = {1, . . . , m}. For each sensor i 6 V, we let 
Ai C A be the subset of anchors (if any) relative to which 
node % also possesses a noisy range measurement. 

Let W be the space of interest (p = 2 for planar networks, 
and p — 3 otherwise). We denote by Xi € W the position 
of sensor i, and by dij the noisy range measurement between 
sensors i and j, available at both i and j. Following fTS) , we 
assume = cL, ||] Anchor positions are denoted by £ MP. 
We let rik denote the noisy range measurement between sensor 
i and anchor k, available at sensor i. 

The distributed network localization problem addressed in 
this work consists in estimating the sensors' positions x = 
{xi : i € V}, from the available measurements {dij : i ~ 
j} U {rik '■ i € V, k € Ai}, through collaborative message 

'This entails no loss of generality: it is readily seen that, if dij ^ dji, then 
it suffices to replace dij 4— (dij + dn)/2 and dji <— (dij + dji)/2 in the 
forthcoming optimization problem {TV. 



passing between neighboring sensors in the communication 
graph g. 

Under the assumption of zero-mean, independent and 
identically-distributed, additive Gaussian measurement noise, 
the maximum likelihood estimator for the sensor positions is 
the solution of the optimization problem 

minimize/ (x), (1) 

X 

where 

/0) = ^i\\xi-Xj\\ - dij) 2 (W x i~ a k\\ - rik) 2 - 

Problem ([TJ is non-convex and difficult to solve. Even in 
the centralized setting (i.e., all measurements are available at 
a central node) currently available iterative techniques don't 
claim convergence to the global optimum. Also, even with 
noiseless measurements, multiple solutions might exist due 
to ambiguities in the network topology itself. For noiseless 
unambiguous topologies, also called localizable networks, the 
measurement noise may create ambiguities. However, recent 
studies like |14| provide theoretical guarantees to confidently 
consider networks which are approximately localizable, i.e., 
whose localizability resists to a reasonable amount of mea- 
surement noise. 



III. DCOOL-NET 

We propose a distributed algorithm, termed DCOOL-NET, 
to tackle problem ([TJ. Starting from an initialization x[0] for 
the unknown sensors' positions x, DCOOL-NET generates 
a sequence of iterates (x[/]) ;>1 which, hopefully, converges 
to a solution of ([TJ. Conceptually, DCOOL-NET is obtained 
by applying the majorization minimization (MM) framework 
(l3j to ([TJ: at each iteration I, a majorizer of /, tight at the 
current iterate x[l], is minimized to obtain the next iterate 
x[l + 1]. The algorithm is outlined in Algorithm [T] for a fixed 
number of iterations L. Here, F(- \x[l]) denotes a majorizer 



Algorithm 1 DCOOL-NET 
Input: x[Q] 
Output: x[L] 

l: for 1 = to L - 1 do 

2: x[l + 1] = argmin x F[x \ x\l\) 

3: end for 

4: return x[L] 



of / (i.e., f(x) < F(x\x[l]) for all x) which is tight at 
x[l] (i.e., f(x[l]) — F(x[l]\ x[l])). The majorizer is detailed 
in the next section. Note that f(x[l + 1]) < f(x[l]), that is, 
/ is monotonically decreasing along iterations, an important 
property of the MM framework. 

DCOOL-NET is a distributed algorithm because, as we shall 
see, the minimization of the upper-bounds F can be achieved 
in a distributed manner. 
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IV. Majorization function 

Commonly, MM techniques resort to quadratic majorizers 
which, albeit easy to minimize, show a considerable mismatch 
with most cost functions (in particular, with / in ([T|). To 
overcome this problem, we introduce a key novel majorizer. It 
is specifically adapted to /, tighter than a quadratic, convex, 
and easily optimizable. 

Before proceeding it is useful to rewrite ([T| as 

where f^x^Xj) = (j) dij (xi-Xj) and fik{xi) = (j> rik (xi-a k ), 
both defined in terms of the basic building block 

<t> d {u) = (H| -df. (2) 




Fig. 1. Nonconvex cost function (black, dash point) in j2j against the 
proposed majorizer (red, solid) in J5J and a vanilla quadratic majorizer (blue, 
dashed) in J5j, for d = 0.5 and v = 0.1. The proposed convex majorizer is 
a much more accurate approximation. 



A. Majorization function for |2} 

Let v £ MP be given, assumed nonzero. We provide a 
majorizer | u) for (f> d in |2]) which is tight at v, i.e., 

4>d{u) < ®d(u | v) for all u and <j> d (v) = | v). 
Proposition 1: Let 

®d(u\v) = max {gd(u), h d (v T u/ \\v\\ - d)} , (3) 

where 

g d (u) = {\\u\\ - df + , (4) 
( r )+ = (max{0, r}) 2 , and 



h R (r) = 




is the Huber function of parameter R. Then, the function 
| v) is convex, is tight at v, and majorizes 4> d . 
Proof: See Appendix |A| ■ 
The tightness of the proposed majorization function is illus- 
trated in Fig. [T] in which we depict, for an one-dimensional 
argument u, d = 0.5 and v = 0.1: the nonconvex cost function 
in pi, the proposed majorizer in ([3]) and a quadratic majorizer 

Q d {u\v) = \\ u \\ 2 + d 2 ~2d V -^, (6) 

IMI 

obtained through routine manipulations of (|2j, e.g., expanding 
the square and linearizing ||u|| at v, which is common in 
MM approaches (c.f. Q, (6|| for quadratic majorizers applied 
to the sensor network localization problem and p3] for an 
application in robust MDS). Clearly, the proposed convex 
majorizer is a better approximation to the nonconvex cost 
function^] As an expected corollary, it also outperforms in 
accuracy the quadratic majorizer when embedded in the MM 
framework, as shown in the experimental results of Sec. [X] 

2 The fact that both majorizers have coincident minimum is an artifact of 
this toy example, and does not hold in general. 



B. Majorization function for ([TJ 

Now, for given x[l], consider the function 

F(x\x[l])=J2 

i~j i fcs^ti 

where 

Fij(xi,Xj) = <& d%J {xi - Xj | Xi[l] - Xj[l]) (8) 

and 

Fik(xi) = $ rik (xi - a k | Xi[l] - a k ). (9) 

Given Proposition [T[ it is clear that it majorizes / and is tight 
at x[l]. Moreover, it is convex as a sum of convex functions. 

V. Distributed minimization 

At the l-th iteration of DCOOL-NET, the convex function 
in (|7]i must be minimized. We now show how this optimization 
problem can be solved collaboratively by the network in a 
distributed, parallel manner. 

A. Problem reformulation 

In the distributed algorithm, the working node will operate 
on local copies of the estimated positions of its neighbors and 
of itself. So, it is convenient to introduce new variables. Let 
Vj = {j : j ~ i] denote the neighbors of sensor i. We 
also define the closed neighborhood V, — V% U {i}. For each 
i, we duplicate xi into new variables yji, j E Vj, and Zik, 
k e Ai- This choice of notation is not fortuitous: the first 
subscript reveals which physical node will store the variable, 
in our proposed implementation; thus, xi and z ik are stored 
at node i, whereas yji is managed by node j. We write the 
minimization of <jvj as the optimization problem 

minimize F(y, z) 

subject to y 3i = x t , j e V t (10) 
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where y = {y Jt : i £ V, j £ Vi}, z = {z ik : i £V,k £ Ai}, 
and 

F(V, Z )=Y1 ( Fi i Vv) + F v fe' ya)) + 

+2j2J2 F ^( z ik)- en) 

i keAi 

In passing from |7} to (jTOf we used the identity F^ (x, ,Xj) = 

^Fij(yu,yij) + sFij (yjiiVjj), due to Vji = x i- A l so , for 
convenience, we have rescaled the objective by a factor of 
two. 

B. Algorithm derivation 
Problem ( fTO} is in the form 

minimize F(y, z) + G{x) 
subject to A(y, z) + Bx = 

where F is the convex function in ( fTTj ), G is the identically 
zero function, A is the identity operator and B is a matrix 
whose rows belong to the set {— ej , i £ V}, being e< the ith 
column of the identity matrix of size |V|. In the presence of 
a connected network B is full column rank, so the problem 
is suited for the Alternating Direction Method of Multipliers 
(ADMM). See [16] and references therein for more details on 
this method. See also (T7)-j55) for applications of ADMM in 
distributed optimization settings. 

Let Xji be the Lagrange multiplier associated with the 
constraint y^ = x. t and A = {Xji} the collection of 
all such multipliers. Similarly, let ^ be the Lagrange 
multiplier associated with the constraint z.ik = x. L and 
P = {Mifc}- The ADMM framework generates a sequence 
(y(*)>2(*)> a (*)j M(*))t>i such that 

(y(t + l),z(t + 1)) = argmin£ p (y, z, x(t), X(t), p(t)) 

V,* 

(13) 

x(t + 1) = argmin C p (y(t + l),z(t + l),x, X(t),fi(t)) 

X 

(14) 

X^t + 1) = X^t) + p{y 3l {t + 1) - Xi {t + 1)) (15) 
lHk{t + 1) = IMk(t) + p(z lk {t + 1) - Xi(t + 1)), (16) 
where C p is the augmented Lagrangian defined as 

Cp(y,z,x,X,fi) = F(y,z) + ^2^2 (^JiiVji ~ x%)+ 

| \\Vji ~ Xif ) + (vlkizik - Xj) + | \\z ik - x,\\ 2 ^ . 

i keAi 

(17) 

Here, p > is a pre-chosen constant. 

In our implementation, we let node i store the variables 
Xi, yij, Xij, Xji, for j £ Vi and z lk , p ik , for k £ Ai. Note 
that a copy of Ay is maintained at both nodes i and j (this 
is to avoid extra communication steps). For t = 0, we can 
set A(0) and p(0) to a pre-chosen constant (e.g., zero) at all 
nodes. Also, we assume that, at the beginning of the iterations 
(i.e., for t — 0) node i knows Xj(0) for j £ Vi (this can 



be accomplished, e.g., by having each node i communicating 
Xi(Q) to all its neighbors). This property will be preserved for 
all i > 1 in our algorithm, via communication steps. 

We now show that the minimizations in ( fT3] l and ( p"4| ) 
can be implemented in a distributed manner and require low 
computational cost at each node. 

VI. ADMM: SOLVING PROBLEM {13} 

As shown in Appendix |B| the augmented Lagrangian in ( fTT] ) 
can be written as 

C p (y, z, x, A, fx) = ^2 ^ dj {y ii i Uij i %j i ^ij ) ~l~ 
* VieVi 

+ ^2 Ci k {z lk ,Xi,p, ik ) J (18) 

where 

f-Hj (yui Uij i x ji ^ij) = Fij (yu, yij) + Ajj (yij — Xj) + 
+ | \\Vij ~ x j\\ 2 

and 

Cik (zik, Xi, p ik ) = 2F ik (z ik ) + pj k (z ik - Xi) + 
, P ii ii 2 

+ 2 W Z ik ~ Xi\\ ■ 

In {[8| we let F u = 0. It is clear from (|T8j that Problem ( fT3| ) 
decouples across sensors i £ V, since we are optimizing only 
over y and z. Further, at each sensor i, it decouples into two 
types of subproblems: one involving the variables yij,j £ Vi, 
given by 

minimize V" £y (yu, .'/<.;• A,, i , (19) 

and into \A\\ subproblems of the form 

minimize L lk (z ik , x i: fj, ik ) , (20) 

involving the variable Zi k , k £ Ai- Note that problems related 
with anchors are simpler, and, since there are usually few 
anchors in a network, they do not occur frequently. 

A. Solving Problem ( fT9] l 

First, note that node i can indeed address Problem (jT9j since 
all the data defining it is available at node i: it stores Xji(t), 
j £ Vi, and it knows Xj(t) for all neighbors j £ Vi (this holds 
trivially for t — by construction, and it is preserved by our 
approach, as shown ahead). 

To alleviate notation we now suppress the indication of 
the working node i, i.e., variable y^ is simply written as yj. 
Problem ([19} can be written as 

minimize V [F^y^yj) + -||% - 7ij || 2 ) 
w.iev, f-J. V 2 J 

+ fll^-7«H 2 , (2D 

where = Xj - -jf. 
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We make the crucial observation that, for fixed yi, the 
problem is separable in the remaining variables yj, j 6 Vj. 
This motivates writing pTj ) as the master problem 



minimize H(yi) = V" ffy(j/t) + - 7« 

jev, 

where 



#»j (yO = min (j/i » yj ) + o 1 1 yj 

Vj I 



(22) 



(23) 



We now state important properties of .Hy. 
Proposition 2: Define as in §23\ . Then: 

1) Optimization problem ( |23] l has a unique solution j/j for 
any given yi, henceforth denoted y*{yi); 

2) Function Hij is convex and differentiable, with gradient 



VH i:j {yi) = p (y*(yi) - 7y) ; 



(24) 



3) The gradient of Hij is Lipschitz continuous with param- 
eter p, i.e., 

||VJr tf (u)-V.Hy(v)|| <p\\u-v\\ 

for all u,v € W. 
Proof: 

1) Recall from (|8) that F,y(yj, %) = <J> rf (y 4 - % | u) where 
d = dij and v — xi[l] — Xj[l]. We have Hij(yi) = 



0(2/i - 7«) where 

9(w) = min$d(w | 



^ II II 2 

^\\ U -M\ ■ 



(25) 



Moreover, u* solves ( |25] l if and only if y* = yi — u* 
solves ( |23| ). Now, the cost function in <j25j is clearly 
continuous, coercive (i.e., it converges to +oo as ||u|| — > 
+oo) and strictly convex, the two last properties arising 
from the quadratic term. Thus, it has an unique solution; 
2) The function 9 is the Moreau-Yosida regularization of 
the convex function $>d(-\v) |23] XI.3.4.4]. As 9 is 
known to be convex and is the composition of 9 
with an affine map, H L j is convex. It is also known that 
the gradient of 9 is 



V9(w) = p(w - u*{w)) 

where u*(w) is the unique solution of 
Thus, 



for a given w. 



vffyte) = ve(i/ i -7 tf ) 

= p(yi-Hj-u*(y i -'Yij)). 

Unwinding the change of variable, i.e., using y*{yi) — 
Vi - v*(y l - jij), we obtain ((24]); 
3) Follows from the well known fact that the gradient of 
9 is Lipschitz continuous with parameter p. 

m 

As a consequence, we obtain several nice properties of the 
function H. 

Theorem 3: Function H in (|22| is strongly convex with 



parameter p, i.e., H — f ||-|| is convex. Furthermore, it is 
differentiable with gradient 

VH( Vi ) =pJ2 (Vj(Vt) ~ Hj) + P{y l ~ 7«)- ( 26 ) 



The gradient of H is Lipschitz continuous with parameter 
L H =p(\Vi\ + l). 

Proof: Since H is a sum of convex functions, it is 
convex. It is strongly convex with parameter p due to the 
presence of the strongly convex term | — jn\\ 2 . As a sum 
of differentiable functions, it is differentiable and the given 
formula for the gradient follows from proposition [2] Finally, 
since H is the sum of \V%\ + 1 functions with Lipschitz 
continuous gradient with parameter p , the claim is proved. 

■ 

The properties established in Theorem [3] show that the 
optimization problem ( [22] ) is suited for Nesterov's optimal 
method for the minimization of strongly convex functions 
with Lipschitz continuous gradient p4] Theorem 2.2.3]. The 
resulting algorithm is outlined in Algorithm |2] which is 
guaranteed to converge to the solution of d22|. 



Algorithm 2 Nesterov's optimal method for (22j 



fc(0)=lft(0) 
for s > do 

y l (s + l) = yi(s) 

&(s + l) = yi (s + l) + 
end for 



(Vi(s + 1) - Vi(s)) 



B. Solving problem \23) 

It remains to show how to solve ( |23] l at a given sensor 
node. Any off-the-shelf convex solver, e.g. based on interior- 
point methods, could handle it. However, we present a simpler 
method that avoids expensive matrix operations, typical of 
interior point methods, by taking advantage of the problem 
structure at hand. This is important in sensor networks where 
the sensors have stringent computational resources. 

First, as shown in the proof of Proposition 2\ it suffices to 
focus on solving ( |25] l for a given w: solving (23]) amounts to 
solving ( |2"5] > for w = yi — 7^ to obtain u* = u*(w) and set 

VjiVi) = Vi~ u *- 

Note from |3]) that < & c i(-|w) only depends on vj so we 
can assume, without loss of generality, that |jw|| = 1. 

From (|3]l, we see that Problem ( pB} can be rewritten as 



minimize 
subject to 



r + <k \\u — w\\ 
9d(u) < r 

„T 



(27) 



h d (v u - d) <r, 
with optimization variable (u,r). The Lagrange dual (c.f., for 



example, |23|) of d27b is given by 



maximize 
subject to 



< cj < 1, 



(28) 



where ip(oj) = inf{ , 5(a;,M) : u E 



P 11 

- \\u - 

2 11 



i n } and 

-(l~uj)h d (v T u-d). (29) 



We propose to solve the dual problem ( |28] l, which involves 
the single variable u, by bisection: we maintain an interval 
[a, b] C [0, 1] (initially, they coincide); we evaluate ip(c) at 
the midpoint c — (a + b)/2; if ip(c) > 0, we set a = c, 
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otherwise, b = c; the scheme is repeated until the uncertainty 
interval is sufficiently small. 

In order to make this approach work, we must prove first 
that the dual function tp is indeed differentiable in the open 
interval fl = (0, 1) and find a convenient formula for its 
derivative. We will need the following useful result from 
convex analysis. 

Lemma 4: Let X C K n be an open convex set and Y C 
R p be a compact set. Let F : X x Y — > R. Assume that 
F(x, •) is lower semi-continuous for all x G X and F(-, y) is 
concave and differentiable for all y £ Y. Let / : X — >• M, 
f(x) = m£{F(x,y) : y G Y}. Assume that, for any x G X, 
the infimum is attained at an unique y*(x) G Y. Then, / is 
differentiable everywhere and its gradient at x G X is given 

by 

Vf(x) = VF(x,y*(x)) (30) 

where V refers to differentiation with respect to x. 

Proof: This is essentially p3| VI.4.4.5], after one changes 
concave for convex, lower semi-continuous for upper semi- 
continuous and inf for sup. ■ 

Now, view 'J in ( |29] l as defined in £1 x W 1 . Is is clear that 
&(uj, •) is lower semi-continuous for all w (in fact, continuous) 
and ty(-,u) is concave (in fact, affine) and differentiable for 
all u. In fact, some even nicer properties hold. 

Lemma 5: Let to G CI, The function \P W = ^>(u>,-) is 
strongly convex with parameter p and differentiable every- 
where with gradient 

V*o,(ti) = p(u-w) + 2uj(u-Tr(u)) + (l-uj)h d (v T u-d)v 1 

(31) 

where ir(u) denotes the projection of u onto the closed ball 
of radius d centered at the origin. Furthermore, the gradient 
of is Lipschitz continuous with parameter p + 2. 

Proof: We start by noting that gd in Q can be written as 
9d( u ) = <^c( u ) wnere C is the closed ball with radius d cen- 
tered at the origin, and dc denotes the distance to the closed 
convex set C. It is known that gd is convex, differentiable, the 
gradient is given by Vgd(u) = 2(u — n(u)) and it is Lipschitz 
continuous with parameter 2 |23] X.3.2.3]. Also, function hd 
in |5]) is convex and differentiable. Thus, the function is 
convex (resp. differentiable) as a sum of three convex (resp. 
differentiable) functions. It is strongly convex with parameter 
p due to the first term § ||- — w\\ 2 . The gradient in pTj ) is 
clear. Finally, from |/id(r)| — hd(s)\ < 2|r — s\ for all r, s, 
there holds for any u\,u-i, 



Lemma 6: Function t/j in ( |28| > is differentiable and its 
derivative is 



\hd{v T ui~d)-h d (v T u 2 -d)\ < 2\v T (ui~u 2 )\ < 2||ui-u 2 ||, 

where ||u|| = 1 and the Cauchy-Schwarz inequality was used 
in the last step. We conclude from ( f3T| > that, for any mi,M2, 

||V* u («i) - V* w (m 2 )|| < {p + 2uj + 2(1 - w)) ||ui - ua|| , 

i.e., the gradient of ^ u is Lipschtz continuous with parame- 
ter p + 2. m 
Using Lemma [5] we see that the infimum of ^ is attained 
at a single since it is a continuous, strongly convex 

function. The derivative of ip in ( [2~8] i relies on u*(oS), as seen 
in Lemma [6] 



ip(uj) = g d (u*(u)) - h d (v T u*(u) - d) 



(32) 



Proof: We begin by bounding the norm of u*(lu). From 
the necessary stationary condition V v I' a j(u*(w)) = and ( |3T| ) 
we conclude 

(p+2oj)u k (u) = pw+2uTi{u*{(jj)) — {l—Lj)hd{v T u k {u:)—d)v. 

(33) 

Since \h d (t)\ < 2d for all t (see (|5)), ||7r(u)|| < d for all u, 
\\v\\ = 1, and < lj < 1, we can bound the norm of the 
right-hand side of ( |33| ) by p ||w|| + 4d. Thus, 



i 



(p\\w\\ +4d) 



p + 2u> 

< -(p\\w\\+M) 
P 

= IMI + — • 
P 

Introduce the compact set U = {u G K." : ||w|| < \\w\\ + 
4:d/p}. The previous analysis has shown that the dual function 
in ( |28j ) can also be represented as ip(u)) = mi{^>(uj, u) : u G 
[/}, i.e., we can restrict the search to U and view ^ as defined 
in Q x U. We can thus invoke Lemma [4] to conclude that ip 
is differentiable and ( (32| ) holds. ■ 
Finding u*(u): To obtain we must minimize \E , W . 

But, given its properties in Lemma |5J the simple optimal 
Nesterov method, described in Algorithm [2] is also applicable 
here. 



C. Solving problem ( |20[ ) 

Note that node i stores Xi(t) and pik(t), k G Ai- Thus, 
it can indeed address Problem (20]). Problem ( p0| ) is similar 
(in fact, much simpler) than ([19 1, and following the previous 
steps leads to the same Nesterov's optimal method. We omit 
this straightforward derivation. 

VII. ADMM: SOLVING PROBLEM ( fT4l > 

Looking at ( [17) , it is clear that Problem fB) decouples 
across nodes also. Furthermore, at node i a simple uncon- 
strained quadratic problem with respect to Xi must be solved, 
whose closed-form solution is 



Xi(t + 1) = j= 



X] (-WfcW + ^fc(* + l)) J • ( 34 ) 



For node i to carry this update, it needs first to receive 
yji(t +1) from its neighbors j G Vj. This requires a 
communication step. 
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VIII. ADMM: IMPLEMENTING {15} AND ( fl6) l 

Recall that the dual variable Xji is maintained at both nodes 
i and j. Node i can carry the update Xji(t + 1) in ( |15) , for 
all j e Vi, since the needed data are available (recall that 
Uji(t + 1) is available from the previous communication step). 
To update Ay(f + 1) = \ij(t) + p{Vij{t + 1) — Xj(t + 1), 
node i needs to receive Xj(t +1) from its neighbors j e V-j. 
This requires a communication step. 



IX. Summary of the distributed algorithm 

Our ADMM-based algorithm currently stops after a fixed 
number of iterations, denoted T. Algorithm [3] outlines the 

Algorithm 3 Step [2] of DCOOL-NET: position updates 
Input: x[l] 
Output: x[l + l] 

l: for t = to T 1 do 

2: for each node i E V in parallel do 

3: Solve Problem ([19]) by minimizing H in ( |22| i with 
Alg. [5]to obtain yy(f + 1), j G V, 

4: for A; = 1 to |A| do 

5: Solve Problem ((20]) to obtain z lk (t + 1) 

6: end for 

7: Send j/y(t + 1) to neighbor j € Vj 
8: Compute (i + 1) from ( [34] ) 
9: Send + 1) to all j g V, 

10: Update {Aj-i(< + 1), A*ife(* + 1), j € V i; fc € A} as 

in {B} and (16) 
li: end for 
12: end for 

13: return sc[/ + 1] = x(T) 



procedure derived in Sees. [V] [VI] and [VTl] and corresponds to 
step [2] of DCOOL-NET (Algorithm [T}. Note that, in order to 
implement step [5] of Algorithm [3] one must adapt Algorithm [2] 
to the problem at hancj^j 

A. Communication load 

Algorithm [3] shows two communication steps: step [JJ and 
step [9] At step [JJ each node i sends |V,-| vectors in MP, 
each to one neighboring sensor, and at step [9] a vector in 
W is broadcast to all nodes in Vi- This results in 2TL|Vi| 
communications of M p vectors for node i for the overall 
algorithm DCOOL-NET. When comparing with SGO in fl2) , 
for T iterations, node i sends T|V»| vectors in M p . The increase 
in communications is the price to pay for the parallel nature 
of DCOOL-NET. 

B. Initialization 

We note that the tools introduced so far could, in principle, 
also be used to set up a distributed initialization scheme 
(i.e., to generate x[0]). More precisely, by following the same 
steps leading to the ADMM formulation, a convex function 

3 In the spirit of reproducible research, all our MATLAB code will be made 
available online. 



structured as (7J is amenable to distributed optimization. 
For example, the ESOCP relaxation, used for initialization 
in fl2) could fall into this template. We leave a more in-depth 
exploration of this issue to future work. 

X. Experimental results 
A. General setup 

Unless otherwise specified, the generated geometric net- 
works are composed by 4 anchors and 50 sensors, with 
an average node degree, i.e., ryr J^ieV of about 6. In 
all experiments the sensors are distributed at random and 
uniformly on a square of 1 x 1, and anchors are placed, unless 
otherwise stated, at the four corners of the unit square (to 
follow (12)), namely, at (0,0), (0,1), (1,0) and (1,1). These 
properties require a communication range of about R = 0.24. 
Since localizability is an issue when assessing the accuracy 
of sensor network localization algorithms, the used networks 
are first checked to be generically globally rigid, so that a 
small disturbance in measurements does not create placement 
ambiguities. To detect generic global rigidity, we used the 
methodologies in fl4] Sec. 2]. The results for the proposed 
algorithm DCOOL-NET consider L = 40 MM iterations, 
unless otherwise stated. 

Measurement noise: Throughout the experiments, the noise 
model for range distance measurements is 

(Uj = \\xi - Xj\\ ■ I ray I, nk = \\xi - a k \\ ■ \n ik \, (35) 

where ray, ~ Af(l,a 2 ) are independent identically dis- 
tributed Gaussian random variables with mean value 1 and 
standard deviation a (specified ahead). 

This is the same model used in ]12[ , against which we 
compare our algorithm. Our choice of benchmark is motivated 
by simulation results in p2[ showing that the method is 
more accurate than the one proposed in (9). As discussed 
in Section [II] and without loss of generality, we assume ray 
affects the edge measurement as a whole, i.e., both dy and 
dji correspond to the same noisy measurement. 

Initialization noise: To initialize the algorithms we take the 
true sensor positions x* = {x* : i e V} and we perturb them 
by adding independent zero mean Gaussian noise, according 
to 

Xi[0] = x* + 7 ?i , (36) 

where 77,; ~ Af(Q, (J? nit I p ) and I p is the identity matrix of size 
p x p. The parameter <7i„it is detailed ahead. 

Accuracy measure: To quantify the precision of both al- 
gorithms we use Root Mean Square Error (RMSE), defined 
as 



RMSE 



1 



MC 

V SE„ 



where 



SE m — 



(37) 



(38) 



denotes the network-wide squared error obtained at the ?rath 
Monte Carlo trial, MC is the number of Monte Carlo trials 
and x m is the estimate of the sensors' positions at the rrath 
Monte Carlo trial. We emphasize that Equation ( [37] ) gives an 
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Initialization noise intensity (tTinit) 



Fig. 2. RMSE vs. o" nl ; t , the intensity of initialization noise in (36) . The 
range measurements are noiseless: a = in j35). Anchors are at the unit 
square comers. The proposed majorizer (red, solid) outperforms the quadratic 
majorizer (blue, dashed) in accuracy. 

accuracy measure per sensor node in the network, and not 
the overall RMSE. This is useful, not only to compare the 
performance between networks of different sizes, but also to 
be able to extract a physical interpretation of the quantity. 

B. Proposed and quadratic majorizers: RMSE vs. initializa- 
tion noise 

We compare the performance of our proposed majorizer 
in |7) with a standard one built out of quadratic functions, 
e.g., the one used in Q. We have submitted a simple source 
localization problem with one sensor and 4 anchors to two 
MM algorithms, each associated with one of the majorization 
functions. They ran for a fixed number of 30 iterations. At each 
Monte Carlo trial, the true sensor positions were corrupted 
by zero mean Gaussian noise, as in ( (36) , with standard 
deviation a- m it € [0.01, 1]. The range measurements are taken 
to be noiseless, i.e., a — in ( |35| ), in order to create an 
idealized scenario for direct comparison of the two approaches. 
The evolution of RMSE as a function of initialization noise 
intensity is illustrated in Fig. [2] There is a clear advantage 
of using this majorization function when the initialization is 
within a radius of the true location which is 30% of the square 
size. 

C. DCOOL-NET and SGO: RMSE vs. initialization noise 

Two sets of experiments were made to compare the RMSE 
performance of SGO in |12| and the proposed DCOOL-NET, 
as a function of the initialization quality (i.e., <7 init in ([36])). 
In the first set, range measurements are noiseless (i.e., a = 
in (|35)), whereas in the second set we consider noisy range 
measurements (a > 0). 

1 ) Noiseless range measurements: In this setup 300 Monte 
Carlo trials were run. As the measurements are accurate 
(a = in ( |35) ) one would expect not only insignificant 
values of RMSE, but also a considerable agreement between 
all the Monte Carlo trials on the solution for sufficiently close 
initializations. Fig. [3] confirms that both DCOOL-NET and 




0.05 0.1 0.15 0.2 0.25 0.3 



Initialization noise intensity (o"i n it) 

Fig. 3. RMSE vs. o"j n ; t , the intensity of initialization noise in (36) . The 
range measurements are noiseless: a = in (35) . Anchors are at the unit 
square corners. Proposed DCOOL-NET (red, solid) and SGO (blue, dashed) 
attain comparable accuracy. 

TABLE I 

Squared error dispersion over Monte Carlo trials for Fig. [5] 



Omit 


DCOOL-NET 


SGO 


0.01 


0.0002 


0.0007 


0.10 


0.0638 


0.1290 


0.30 


0.2380 


0.3400 




0.05 0.1 0.15 0.2 0.25 0.3 



Initialization noise intensity (<Ti n i t ) 

Fig. 4. RMSE vs. (7j n it, the intensity of initialization noise in (36) . The 
range measurements are noisy: a = 0.12 in (35) . Anchors are at the unit 
square corners. Proposed DCOOL-NET (red, solid) outperforms SGO (blue, 
dashed) in accuracy. 

SGO achieve small error positions, and their accuracies are 
comparable. As stated before, SGO also has a low computa- 
tional complexity. In fact, lower than DCOOL-NET (although 
DCOOL-NET is fully parallel across nodes, whereas SGO 
operates by activating the nodes sequentially, implying some 
high-level coordination). Tab. [I] shows the squared error dis- 
persion over all Monte Carlo trials, i.e., the standard deviation 
of the data {SE m : m = 1, . . . , MC}, recall ((38), for both 
algorithms. We see that DCOOL-NET exhibits a more stable 
performance, in the sense that it has a lower squared error 
dispersion. 

2) Noisy range measurements: We set a = 0.12 in the noise 
model ((35). Fig. g] shows that DCOOL-NET fares better than 
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TABLE II 

Squared error dispersion over Monte Carlo trials for Fig. [4] 



"init 


DCOOL-NET 


SGO 


0.00 


0.0118 


0.0783 


0.01 


0.0121 


0.0775 


0.10 


0.0727 


0.1610 


0.30 


0.2490 


0.3320 




0.1 0.2 0.3 0.4 0.5 



Initialization noise intensity (crhiit) 

Fig. 5. RMSE vs. 0"i n it, the intensity of initialization noise in (36) . The 
range measurements are noisy: a = 0.12 in J35J. Anchors were randomly 
placed in the unit square. Proposed DCOOL-NET (red, solid) outperforms 
SGO (blue, dashed) in accuracy. 

TABLE HI 

Squared error dispersion over Monte Carlo trials for Fig. \5\ 




0.05 0.1 0.15 0.2 0.25 0.3 



Measurement noise intensity (tr) 

Fig. 6. RMSE vs. <r, the intensity of measurement noise in {35}. No 
initialization noise: o"i n jt = in {36}. Anchors are at the unit square 
corners. Proposed DCOOL-NET (red, solid) outperforms SGO (blue, dashed) 
in accuracy. 

TABLE IV 

Squared error dispersion over Monte Carlo trials for Fig. [6] 



(7 


DCOOL-NET 


SGO 


0.01 


0.0002 


0.0016 


0.10 


0.0177 


0.0688 


0.12 


0.0218 


0.0702 


0.15 


0.0326 


0.0921 


0.17 


0.0394 


0.0993 


0.20 


0.0525 


0.1090 


0.30 


0.1020 


0.1630 



finit 


DCOOL-NET 


SGO 


0.00 


0.0097 


0.0712 


0.01 


0.0099 


0.0709 


0.10 


0.1550 


0.3160 


0.30 


0.4350 


0.8440 


0.50 


0.8330 


1.3000 



and DCOOL-NET performs L = 100 iteration^ Fig. [6] and 
Tab. IV summarize the computer simulations for this setup. 



As before, DCOOL-NET consistently achieves better accuracy 
and stability. 



SGO: the gap between the performances of both algorithms 
is now quite significant. The squared error dispersion over all 
Monte Carlo trials for both algorithms is given in Tab. |TTJ As 
before, we see that DCOOL-NET is more reliable, in the sense 
that it exhibits lower variance of estimates across Monte Carlo 
experiments. 

We also considered placing the anchors randomly within 
the unit square, instead of at the corners. This is a more 
realistic and challenging setup, where the sensors are no longer 
necessarily located inside the convex hull of the anchors. 
The corresponding results are shown in Fig. [5] and Tab. [Til] 
for 250 Monte Carlo trials. Again, DCOOL-NET achieves 
better accuracy. Comparing the dispersions in Tabs. |H| and EH 
also reveals that the gap in reliability between SGO and our 
algorithm is now wider. 

D. DCOOL-NET and SGO: RMSE vs. measurement noise 

To evaluate the sensitivity of both algorithms to the in- 
tensity of noise present in range measurements {i.e., a 
in (|35]l), 300 Monte Carlo trials were run for a = 
0.01,0.1,0.12,0.15,0.17,0.2,0.3. Both algorithms were ini- 
tialized at the true sensor positions, i.e., a- ln it = in ((36}, 



E. DCOOL-NET and SGO: RMSE vs. communication cost 

We assessed how the RMSE varies with the communication 
load incurred by both algorithms. We considered the gen- 
eral setup described in Sec. |X-A| The results are displayed 
in Fig. [7] We see an interesting tradeoff: SGO converges 
much quicker than DCOOL-NET (in terms of communication 
rounds), and attains a lower RMSE sooner. However, DCOOL- 
NET can improve its accuracy through more communications, 
whereas SGO remains trapped in a suboptimal solution. 

F. DCOOL-NET: RMSE vs. parameter p 

The parameter p plays a role in the augmented Lagrangian 
discussed in Sec. [V] and is user-selected. As such, it is 
important to study the sensitivity of DCOOL-NET to this 
parameter choice. For this purpose, we have tested several p 
between 1 and 200. For each choice, 300 Monte Carlo trials 
were performed using noisy measurements and initializations. 
Fig. [8] portrays RMSE against p for L = 40 iterations of 
DCOOL-NET. There is no ample variation, especially for 
values of p over 30, which offers some confidence in the 

4 This is to guarantee that, in practice, DCOOL-NET indeed attained a fixed 
point, but the results barely changed for L = 40. 
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Fig. 7. RMSE versus total number of two-dimensional vectors communicated 
in the network. The range measurements are noiseless: a = in (35) . 
Initialization is noisy: o"i n i t = 0.1 in (36) . Anchors are at the unit square 
corners. Proposed DCOOL-NET (red, solid) outperforms SGO (blue, dashed) 
in accuracy, at the expense of more communications. 



0.065 - 




0.045 

20 40 60 80 100 120 140 160 180 200 

Parameter value (p) 

Fig. 8. RMSE vs. p. The range measurements are noisy: a = 0.05 in (35) . 
Initialization is noisy: <T; mt = 0.1 in (36). Anchors are the unit square corners. 



algorithm resilience to this parameter, a pivotal feature from 
the practical standpoint. However, an analytical approach for 
selecting the optimal p is beyond the scope of this work, and 
is postponed for future research. Note that adaptive schemes 



to adjust p do exist for centralized settings, e.g., [16], but 
seem impractical for distributed setups as they require global 
computations. 

XI. Conclusions 

Sensor network localization based on noisy range measure- 
ment between some pairs of sensors is a current topic of 
great interest, which maps into a difficult nonconvex opti- 
mization problem, once cast in a maximum likelihood (ML) 
framework. Efficient algorithms guaranteed to find the global 
minimum are not known even for the centralized setting. 
In this work, we proposed an algorithm, termed DCOOL- 
NET, for the more challenging distributed setup in which 
no central or fusion node is available: instead, neighbor 
nodes collaboratively exchange messages in order to solve the 
underlying optimization problem. DCOOL-NET stems from a 
majorization-minorization (MM) approach and capitalizes on a 



novel convex majorizer. The proposed majorizer is a main con- 
tribution of this work: it is tuned to the nonconvexities of the 
cost function, which translates into better performance when 
directly compared with traditional MM quadratic majorizers. 
More importantly, the new majorizer exhibits several important 
properties for the problem at hand: it allows for distributed, 
parallel, optimization via the alternating direction method of 
multipliers (ADMM) and for low-complexity solvers based 
on fast-gradient Nesterov methods to tackle the ADMM 
subproblems at each node. DCOOL-NET decreases the cost 
function at each iteration, a property inherited from the MM 
framework, but it is not guaranteed to find the global minimum 
(a theoretical limitation shared by all existing distributed and 
non-distributed algorithms). However, computer simulations 
show that DCOOL-NET achieve better sensor positioning ac- 
curacy than a state-of-art method distributed algorithm which, 
furthermore, is not parallel. 

Appendix A 
Proof of PropositionQ] 

We write $>d(u) instead of <&d(u\v) and we let (x,y) = 
x T y. 

Convexity: Note that g<i is convex as the composition of 
the convex, non-decreasing function {■) 2 y with the convex 
function ||-|| — d. Also, hd((v/\\v\\ , •) — d) is convex as the 
composition of the convex Huber function hd{) with the affine 
map — d. Finally, $,j is convex as the pointwise 

maximum of two convex functions. 

Tightness: It is straightforward to check that 4>d{v) = &d(v) 
by examining separately the three cases ||u|| < d, d < \\v\\ < 
2d and ||v|| > 2d. 

Majorization: We must show that $d(u) > 4>d(u) for all u. 
First, consider |ju|| > d. Then, gd(u) = <j>d{u) and it follows 
ihat$ d (i>) =max{g d (u),hd((v/\\v\\,u}-d)} > 4> d (u). Now, 
consider < d and write u = Ru, where R — \\u\\ < d and 
||u|| = 1. It is straightforward to check that, in terms of R and 
u, we have (j>d(u) = {R — d) 2 and $d(u) = hd(R(v, u) — d), 
where v — v/\\v\\. Thus, we must show that hd(R(v,u) — d) > 
(R — d) 2 . Motivated by the definition of the Huber function hd 
in two branches, we divide the analysis in two cases. 

Case 1: \R(v,u) — d\ < d. In this case, hd(R(v,u) — d) = 
{R(v,u} - d) 2 . Noting that \(v, u) \ < 1, there holds 

(R(v, u) - d) 2 > mi{(Rz - df : \z\ < 1} = (R - d) 2 , 

where the fact that R < d was used to compute the infimum 
over z (attained at z = 1). 

Case 2: \R(v, it) — d\ > d. In this case, hd(R(v, u) — d) = 
2d\R(v,u) -d\- d 2 . Thus, 

h d {R(v, u)-d)>d 2 >(d- R) 2 , 

where the last inequality follows from < R < d. 

Appendix B 
Proof of ( [T8] i 

We show how to rewrite ( fT7| ) as ( flS) . First, note that F(y, z) 
in ( fTTj ) can be rewritten as 

F(y,z) = E F ii(v*i>Vii) E ( 39 ) 
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Here, we used the fact that J/jj) = FjiiVijiVji) which 

follows from dij — dji and $d{u\v) = &d(— u\ — v), see pp. 
In addition, there holds 

x Ji(y^ ~ x i) + ^ \\yji - ^n 2 

= EE \>« - xj) + \ ha - x,f . (40) 

i jeVi 

The first equality follows from interchanging i with j. The 
second equality follows from noting that i 6 Vj if and only 
if j e Vi. Using ([39) and (|40j» in <[l7j gives ([18). 
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